-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
all: add 'copy' mount option #334
Conversation
This copies the content of the named composefs image into a sealed memfd and mounts that, instead of using the underlying file. The main benefit here is that the underlying filesystem isn't pinned by having one of its files used as the basis of the loopback device. This is useful if you want to embed a composefs into an initramfs, for example. There are also integrity benefits. Without this option, the fsverity of the file is verified at mount time, but nothing stops the file from being modified after the fact. With the copy option, we measure and compare the fsverity signature after copying into and sealing the memfd. Right now this is implemented as a straight copy, but we may also add support for decompression at some point. The erofs images compress reasonably well using general-purpose compression algorithms: a 52MB image of /usr on my system compresses down to 14MB with zstd and 13MB with xz.
This is not currently intended for merging, but might end up being part of the work we need to realize #332. |
The verity stuff is wrong here, mostly because I didn't understand how it worked: it's enforced by the filesystem and we just have to verify that the fingerprint is the one we expected. After we do that, we can't possibly read incorrect data. Also: memfd doesn't support it, which means we'll end up with Instead, if we really want to inspect the data in the memfd, we should probably use the userspace I'm not sure if there's a real benefit to pursuing this, but in any case, if we do want to allow mixing |
Right, we should be invoking |
I think at least historically some initramfs code basically solved this with |
I've not tested this change but I really like the approach. 👍🏻 |
BTW we hard require clang-format, it looks like e.g. |
@@ -198,6 +199,8 @@ int main(int argc, char **argv) | |||
opt_ro = false; | |||
} else if (strcmp("ro", key) == 0) { | |||
opt_ro = true; | |||
} else if (strcmp("copy", key) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add this to the manpage and the usage output.
ssize_t n; | ||
do { | ||
char buffer[1 << 20]; // 1MB | ||
n = read(fd, buffer, sizeof buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to handle EINTR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have copy_file_data_classic()
n = read(fd, buffer, sizeof buffer); | ||
if (n < 0) | ||
break; | ||
n = write(memfd, buffer, n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to handle short writes, EINTR and write errors in general. I mean, i'm sure memfd writes don't get i/o errors, but you may be low on memory or whatever.
@@ -666,6 +668,35 @@ static errint_t lcfs_mount(struct lcfs_mount_state_s *state) | |||
return -EINVAL; | |||
} | |||
|
|||
static int lcfs_mount_copy_and_seal(int fd) { | |||
int memfd = memfd_create(CFS_MOUNT_SOURCE, MFD_CLOEXEC | MFD_ALLOW_SEALING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use cleanup_fd int memfd = -1;:
here and then return steal_fd(&memfd)
; (Although i see we need to add a steal_fd for this.
@@ -680,7 +711,21 @@ int lcfs_mount_fd(int fd, const char *mountpoint, struct lcfs_mount_options_s *o | |||
return -1; | |||
} | |||
|
|||
int memfd = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleanup_fd
memfd = lcfs_mount_copy_and_seal(fd); | ||
if (memfd < 0) { | ||
return memfd; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather skip the else clause in this kind of "didn't get error" codepath. I.e.:
if (memfd < 0) {
return memfd:
}
state.fd = memfd;
So, yeah, the integrity parts here are not correct, but I think this makes sense in some cases to eg. allow the source fs to be unmounted, but also as a general robustness thing. Like, its nice that we can guarantee "right data or EIO", but with this, we can guarantee "right data" (at least for the fs metadata), at the cost of slower mount and more memory use. |
I'm not sure how much this matters, but we can also with this get fs-verity validation if the file on disk doesn't have fs-verity (i.e. if it is on FAT or whatever), because we can manually valdiate the digest when copying into the memfd. Of course, we would still need fs-verity for the backing files. Again, not sure if this is practically useful, but worth considering. |
It's absolutely my plan to do the fsverity checking in userspace on the already-sealed memfd. |
...until we find some kernel hacker who is willing to add support for fsverity to tmpfs/memfd... |
I don't think this is necessary if we validated the digest on the fd after we opened it. We just copied data that was already validated (by the read call). How could it have become modified after that? I mean, in theory it could somehow become modified in memory after reading by some ptrace attacker, but that is also true for userspace validation after the fact (i.e. the code that checks the fsverity digest of the data read from the sealed fd could have been modified). Really, the security model is that mount.composefs comes from a signed trusted initrd, and runs only code that we trust, so there is no "attacker". |
Btw, I would do the fsverity manual computation in the copy-to-memfd loop, then you better parallelize the "read next block" and the "compute digest of block". |
I think someone could probably also open("/proc/($pidof mount.composefs)/fd/3") or whatever the memfd is and modify it like that. I'm not super-concerned about that case. Supporting filesystems that lack their own fsverity is more what I'm aiming for here, and I agree that this could be done while streaming the file from the disk... |
Not if it is sealed. Oh, or i guess you mean while its happening, similar to the ptrace attack. And yeah, that is not interesting. |
That said, you will still definitely need fs-verity for the filesystem backing the file objects. No way out of that. |
This much is clear. What we're imagining, though, is that the erofs layer lives in the initramfs, where it doesn't get fsverity. The digest store is on a "real" filesystem, though. |
This sounds very interesting. However, in that case the file is typically trusted by being on the initrd itself, which is signed, so fs-verity of it is not strictly necessary then. (In particular, because generally all fs-verity digests we verify against are generally trustworthy due to having the public key in the initrd.) |
Yes agreed, there is no need for us to enforce fsverity of the erofs in this model. Maybe we change Actually though...will Linux ever swap out a memfd contents? If it can, then we do need to enforce verity for it right? |
Digging a bit there's nowadays |
Of course, the other alternative design to think about here is basically: "don't put an EROFS in the initramfs, just put it unpacked in e.g. However, the swapping thing still seems like a concern to me...actually look, there's already a |
Does this work? For example, does the initramfs tar unpacking support full xattrs? Also, my guess is that a full in-memory tmpfs of the rootfs metadata will use a lot more ram than just the erofs image. |
I think we've decided not to do this. |
This copies the content of the named composefs image into a sealed memfd and mounts that, instead of using the underlying file.
The main benefit here is that the underlying filesystem isn't pinned by having one of its files used as the basis of the loopback device. This is useful if you want to embed a composefs into an initramfs, for example.
There are also integrity benefits. Without this option, the fsverity of the file is verified at mount time, but nothing stops the file from being modified after the fact. With the copy option, we measure and compare the fsverity signature after copying into and sealing the memfd.
Right now this is implemented as a straight copy, but we may also add support for decompression at some point. The erofs images compress reasonably well using general-purpose compression algorithms: a 52MB image of /usr on my system compresses down to 14MB with zstd and 13MB with xz.